Abstract



The sequential parameter optimization (SPOT) package for R (R Development
Core Team 2008) is a toolbox for tuning and understanding
simulation and optimization algorithms. Model-based investigations are 
common approaches in simulation and optimization. Sequential parame- 
ter optimization has been developed, because there is a strong need for 
sound statistical analysis of simulation and optimization algorithms. SPOT 
includes methods for tuning based on classical regression and analysis of 
variance techniques; tree-based models such as CART and random forest; 
Gaussian process models (Kriging), and combinations of different meta- 
modeling approaches. This article exemplifies how SPOT can be used for 
automatic and interactive tuning. 



1 Introduction 

This article illustrates the functions of the SPOT package. The SPOT package
can be downloaded from the comprehensive R archive network at
http://CRAN.R-project.org/package=SPOT. SPOT is one possible implementation
of the sequential parameter optimization (SPO) framework introduced in
Bartz-Beielstein (2006). For detailed documentation of the functions from the
SPOT package, the reader is referred to the package help manuals.

The performance of modern search heuristics such as evolution strategies (ES),
differential evolution (DE), or simulated annealing (SANN) relies crucially on
their parametrizations or, statistically speaking, on their factor settings. The
term algorithm design summarizes factors that influence the behavior
(performance) of an algorithm, whereas problem design refers to factors from the
optimization (simulation) problem. Population size in ES is one typical factor
that belongs to the algorithm design, whereas the search space dimension belongs
to the problem design. We will consider SANN in the remainder of this article,
because it requires the specification of only two algorithm parameters.

One interesting goal of SPO is to detect the importance of certain parts
(subroutines such as recombination in ES) by systematically varying the factor
settings of the algorithm design. This goal is related to improving the
algorithm's efficiency and will be referred to in the following as algorithm tuning,
where the experimenter is seeking an improved parameter setting, say p*, for
one problem instance. Varying problem instances, e.g., search space dimensions
or starting points of the algorithm, is associated with effectivity, i.e., the
algorithm's robustness. In this case, the experimenter is interested in one parameter
setting with which the algorithm performs sufficiently well on
several problem instances. SPOT can be applied to both tasks. The focus of
this article lies on improving the algorithm's efficiency.

Besides an improved performance of the algorithm, SPO may lead to a
better understanding of the algorithm. SPO combines several techniques from
classical and modern statistics, namely design of experiments (DoE) and design
and analysis of computer experiments (DACE) (Bartz-Beielstein 2006). Basic
ideas from SPO rely heavily on Kleijnen's work on statistical techniques in
simulation (Kleijnen 1987, 2008).

Note that we do not claim that SPO is the only suitable way for tuning
algorithms. Far from it! We state that SPO presents only one possible way,
which is possibly not the best for your specific problem. We highly recommend
other approaches in this field, namely F-Race (Birattari 2005), ParamILS
(Hutter et al. 2009), and REVAC (Nannen 2009).



The paper is structured as follows: Section 2 presents an introductory
example which illustrates the use of tuning. The sequential parameter optimization
framework is presented in Sect. 3. Details of the sequential parameter
optimization toolbox are presented in Sect. 4. SPOT uses plugins. Typical plugins are
discussed in Sect. 5. How SPOT can be refined is exemplified in Sect. 6. Section 7
presents a summary and an outlook.



2 Motivation 

2.1 A Typical Situation 

We will discuss a typical situation from optimization. The practitioner is
interested in optimizing an objective function, say f, with an optimization algorithm
A. She can use the optimization algorithm with default parameters. This may
lead to good results in some cases, whereas in other situations results are not
satisfactory. In the latter cases, practitioners try to determine improved
parameter settings for the algorithms manually, e.g., by changing one algorithm
parameter at a time. Before we discuss problems related to this approach,
we will take a broader view and consider the general framework of optimization
via simulation, which occurs in many real-world optimization scenarios.

Figure 1: Optimization via simulation. Illustration taken from Ziegenhirt et al.
(2010), who describe how SPOT can be applied to the optimization of a
biogas-simulation model. Four different layers of the optimization of a biogas simulation
are shown. The first layer (L1) represents the real-world setting. Layer 2 (L2)
shows the simulator. An objective function f is defined at this layer. The
optimization algorithm A belongs to the third layer (L3). The fourth layer (L4)
represents the algorithm tuning procedure, e.g., sequential parameter
optimization.

2.2 Optimization via Simulation 
2.2.1 Modeling Layers 

To illustrate the task of optimization via simulation, the following layers can be 
used. 

(L1) The real-world system, e.g., a biogas plant.

(L2) The related simulation model. The objective function f is defined at this
layer. In optimization via simulation, problem parameters are defined at
this layer.

(L3) The optimization algorithm A. It requires the specification of algorithm
parameters, say p ∈ P, where P denotes the set of parameter vectors.

(L4) The experiments and the tuning procedure.

Figure 1 illustrates the situation. To keep the setting as simple as possible, we
consider an objective function f from layer (L2) and do not discuss interactions
between (L1) and (L2). Defining the relationship between (L1) and (L2) is
not a trivial task. The reader is referred to Law (2007) and Fu (2002) for an
introduction.

Figure 2: Plots of the Branin function. The contour plot (left) shows the location
of the three global minima



2.3 Description of the Objective Function 

The Branin function

$$
f(x_1, x_2) = \left( x_2 - \frac{5.1}{4\pi^2} x_1^2 + \frac{5}{\pi} x_1 - 6 \right)^2 + 10 \left( 1 - \frac{1}{8\pi} \right) \cos(x_1) + 10, \tag{1}
$$

with $x_1 \in [-5, 10]$ and $x_2 \in [0, 15]$, was chosen as a test function, because
it is well-known in the global optimization community, so results are comparable.
It has three global minima, $\mathbf{x}_1^* = [3.1416, 2.2750]$,
$\mathbf{x}_2^* = [9.4248, 2.4750]$, and $\mathbf{x}_3^* = [-3.1416, 12.2750]$, with
$f(\mathbf{x}_i^*) = 0.39789$ $(i = 1, 2, 3)$, see Fig. 2.
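As a quick numerical check of Eq. (1), the following R sketch evaluates the function at the three minima listed above. The function name `branin` is chosen here for illustration only, and exact multiples of pi are used in place of the rounded coordinates.

```r
# Branin function as in Eq. (1); x is a numeric vector c(x1, x2)
branin <- function(x) {
  (x[2] - 5.1 / (4 * pi^2) * x[1]^2 + 5 / pi * x[1] - 6)^2 +
    10 * (1 - 1 / (8 * pi)) * cos(x[1]) + 10
}

# The three global minima (first coordinates are -pi, pi, and 3 * pi)
minima <- list(c(pi, 2.275), c(3 * pi, 2.475), c(-pi, 12.275))
values <- vapply(minima, branin, numeric(1))
print(round(values, 5))
```

At these points the squared term vanishes and the cosine equals -1, so each value reduces to 10/(8 pi), which agrees with the reported 0.39789.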



2.4 Description of the Optimization Algorithm 

In order to improve reproducibility of the examples presented in this article,
an algorithm which is an integral part of the R system, the method SANN,
was chosen. It is described in R's help system as follows (R Development Core
Team 2008): Method SANN is by default a variant of simulated annealing
given in Belisle (1992). Simulated annealing belongs to the class of stochastic
global optimization methods. It uses only function values but is relatively slow.
It will also work for non-differentiable functions. This implementation uses the
Metropolis function for the acceptance probability. By default the next
candidate point is generated from a Gaussian Markov kernel with scale proportional
to the actual temperature. If a function to generate a new candidate point is
given, method SANN can also be used to solve combinatorial optimization
problems. Temperatures are decreased according to the logarithmic cooling schedule
as given in Belisle (1992); specifically, the temperature is set to

temp / log(((t-1) %/% tmax)*tmax + exp(1))

where t is the current iteration step and temp and tmax are specifiable via
control. Note that the SANN method depends critically on the settings of the
control parameters. Summarizing, there are two algorithm parameters which
have to be specified before the algorithm is run:

1. temp controls the SANN method. It is the starting temperature for the
cooling schedule. Defaults to 10.

2. tmax is the number of function evaluations at each temperature for the
SANN method. Defaults to 10.

Note that tmax is an integer. How different parameter types can be handled is
described in Sect. 6.2. To simplify the discussion, temp will be treated as a numerical
value in the remainder of this article.
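To see how temp and tmax shape the cooling schedule, the formula quoted from R's help system can be evaluated directly. The following is a minimal sketch; the helper name `sann_temperature` is hypothetical and not part of R or SPOT.

```r
# Temperature at iteration step t under the logarithmic cooling schedule
# quoted above; %/% is integer division, so the temperature stays constant
# for tmax consecutive steps before it is lowered.
sann_temperature <- function(t, temp = 10, tmax = 10) {
  temp / log(((t - 1) %/% tmax) * tmax + exp(1))
}

steps <- c(1, 10, 11, 100, 250)
schedule <- data.frame(step = steps,
                       temperature = sann_temperature(steps))
print(schedule)
```

With the defaults, the first tmax steps run at the starting temperature temp, because log(exp(1)) equals 1. A larger tmax therefore lengthens each plateau, while temp scales the whole schedule.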



2.5 Starting Optimization Runs 

Now we discuss the typical situation from optimization: An experimenter
applies an optimization algorithm A (SANN) to an objective function f (Branin
function) in order to determine the minimum.

First, we will set the seed to obtain reproducible results.

> set.seed(1)

Next, we will define the objective function.

> spotFunctionBranin <- function(x) {
+     x1 <- x[1]
+     x2 <- x[2]
+     (x2 - 5.1/(4 * pi^2) * (x1^2) + 5/pi * x1 - 6)^2 + 10 * (1 -
+         1/(8 * pi)) * cos(x1) + 10
+ }

Then, the starting point for the optimization, x0, and the number of function
evaluations, maxit, are defined:

> x0 <- c(10, 10)
> maxit <- 250

The parameters specified so far belong to the problem design. Now we have to
consider parameters from the algorithm design, i.e., parameters that control the
behavior of the SANN algorithm, namely tmax and temp. Default values are
chosen first:

> tmax <- 10
> temp <- 10

Finally, we can start the optimization algorithm (SANN):

> y1 <- optim(x0, spotFunctionBranin, method = "SANN", control =
+     list(maxit = maxit, temp = temp, tmax = tmax))

SANN returns the following result:

> print(y1$value)
[1] 4.067359

Since the optimum value reads y* = 0.39789, the practitioner is interested in
improving this result by modifying the algorithm parameters tmax and temp:

> tmax <- 10
> temp <- 20
> y2 <- optim(x0, spotFunctionBranin, method = "SANN",
+     control = list(maxit = maxit, temp = temp, tmax = tmax))

Results obtained with the new tmax and temp values look promising:

> print(y2$value)
[1] 0.4570975

However, since SANN is a stochastic algorithm, the practitioner wants to
investigate the dependency of the results on the random seed. So she performs
the same experiment with a modified seed.

> set.seed(1000)
> y3 <- optim(x0, spotFunctionBranin, method = "SANN",
+     control = list(maxit = maxit, temp = temp, tmax = tmax))
> print(y3$value)
[1] 7.989125

This result is rather disappointing, because a worse value is obtained with these
seemingly better parameter settings. Results from these experiments are
summarized in Tab. 1.

Table 1: Results from manually tuning SANN on the Branin function. Smaller
values are better. Run 1 reports results from the default configuration. Run
2 uses a different temperature and obtains a better function value. However,
this result cannot be generalized, because modifying the seed leads to a worse
function value

run   temp   tmax   seed   result
1     10     10     1      4.067359
2     20     10     1      0.4570975
3     20     10     1000   7.989125

The practitioner has modified one variable (temp) only. Introducing
variations of the second variable (tmax) complicates the situation, because
interactions between these two variables might occur. In addition, the experimenter
has to take random effects into account. This is where SPOT comes into play. SPOT
was developed for tuning algorithms in a reproducible way. It uses results from
algorithm runs to build up a meta model. This meta model enables the
experimenter to detect important input variables, estimate effects, and determine
improved algorithm configurations in a reproducible manner. Last but not least,
the experimenter learns from these results.

One simple goal, which can be tackled with SPOT, is to determine the best
parameter setting of the optimization algorithm for one specific instance of an
optimization problem. It is not easy to define the term "best", because it can
be defined in many ways and this definition is usually problem specific. Klein
(2002) presents interesting aspects from practice. See also the discussion in
chapter 7 of Bartz-Beielstein (2006). Therefore, we will take a naive approach
by defining our tuning goal as the following hypothesis:

(H-1) "We can determine a parameter setting p* which improves SANN's
performance. To measure this performance gain, the average function values
from ten runs of SANN with default, i.e., p0, and tuned parameter settings p*
are compared."



2.6 Tuning with SPOT 

Before SPOT is described in detail, we will demonstrate how it can be applied 
to find an answer for hypothesis (H-1). 



2.6.1 SPOT Projects 

A SPOT project consists of a set of files with the same basename, but different
extensions, namely CONF, ROI, and APD. Here, we will discuss the project
demo07RandomForestSann, which is included in the SPOT package, see

> demo(package = "SPOT")

for demos in the SPOT package. Demo projects, which are included in the SPOT
package, can be found in the directory of your local SPOT installation, e.g.,
~/R/i486-pc-linux-gnu-library/2.11/SPOT on Linux systems.

A configuration (CONF) file, which stores information about SPOT specific
settings, has to be set up. For example, the number of SANN algorithm runs,
i.e., the available budget, can be specified via auto.loop.nevals. SPOT
implements a sequential approach, i.e., the available budget is not used in one step.
Evaluations of the algorithm on a subset of this budget, the so-called initial
design, are used to generate a coarse-grained meta model F. This meta model
is used to determine promising algorithm design points which will be evaluated
next. Results from these additional SANN runs are used to refine the meta
model F. The size of the initial design can be specified via init.design.size.
To generate the meta model, we use random forest (Breiman 2001). This
can be specified via seq.predictionModel.func = "spotPredictRandomForest".
Available meta models are listed in Sect. 5.3. Random forest was
chosen, because it is a robust method which can handle categorical and
numerical variables. In the following example, we will use the configuration file
demo07RandomForestSann.conf.

A region of interest (ROI) file specifies algorithm parameters and associated
lower and upper bounds for the algorithm parameters. Values for temp are
chosen from the interval [1, 50]. TEMP 1 50 FLOAT is the corresponding line for
the temp parameter which is added to the file demo07RandomForestSann.roi.

Optionally, an algorithm problem design (APD) file can be specified. This
file contains information about the problem and might be used by the algorithm.
For example, the starting point x0 = c(10, 10) can be specified in the APD
file. The file demo07RandomForestSann.apd will be used in our example.
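To make the role of the ROI concrete, the following R sketch reads ROI-like text into a data frame and draws one random configuration from it. Only the line TEMP 1 50 FLOAT is taken from the text; the header line, the TMAX bounds, and the helper `sample_config` are assumptions made for illustration and do not describe SPOT's own file handling.

```r
# Hypothetical ROI content: the TEMP line is quoted from the text, the header
# line and the TMAX line are assumptions made for this sketch.
roi_text <- "name low high type
TEMP 1 50 FLOAT
TMAX 1 50 INT"

roi <- read.table(text = roi_text, header = TRUE, stringsAsFactors = FALSE)

# Draw one random configuration inside the region of interest and
# round the parameters that are declared as INT.
sample_config <- function(roi) {
  value <- runif(nrow(roi), min = roi$low, max = roi$high)
  is_int <- roi$type == "INT"
  value[is_int] <- round(value[is_int])
  setNames(value, roi$name)
}

set.seed(1)
config <- sample_config(roi)
print(config)
```

Keeping the bounds and types in one table is what allows a tuner to treat numerical and integer parameters such as temp and tmax uniformly when it generates design points.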



2.6.2 Starting SPOT in Automatic Mode 

If these files are available, SPOT can be started from R's command line via

> library(SPOT)
> spot("demo07RandomForestSann.conf")

SPOT is run in automatic mode if no task is specified (this is the default setting).
The result from this run reads

Best solution found with 236 evaluations:

        Y     TEMP TMAX COUNT CONFIG
0.3992229 1.283295   41    10     36

SPOT has determined a configuration temp = 1.283295 and tmax = 41, which
gives an average function value from ten runs of y = 0.3998429. SPOT uses an
internal counter (COUNT) for configurations. The best solution was found with
configuration 36. The tuning process is illustrated in Fig. 3. Figure 4 shows a
regression tree which is generated by the default report function
spotReportDefault.



2.7 Validating the Results 

Finally, we will evaluate the result by comparing ten runs of SANN with default
parameter settings, say p0, to ten runs with the tuned configuration from SPOT,
say p*. The corresponding R commands used for this comparison are shown
in the Appendix. First, we will set the seed to obtain reproducible results.
Next, we will define the objective function. Then, the starting point for the
optimization x0 and the number of function evaluations maxit are defined.
The parameters specified so far belong to the problem design.

Figure 3: Tuning SANN with SPOT (plot title: Eval: 236, Y: 0.399222931305286).
Random forest was chosen as a meta model. This output can also be shown
on-line, i.e., during the tuning process, in order to visualize the progress. The
first panel shows the average function value of the best configuration found so
far. The second and third panels visualize the corresponding parameter settings.
These values are updated each time a new meta model (random forest) is built.
Each time a meta model is built, the step counter is increased. Altogether 14
meta models (random forests) are built during this tuning process and 236 runs
of the SANN algorithm were executed

Figure 4: Tuning SANN with SPOT. Random forest was chosen as a meta
model. This simple regression tree is generated by SPOT's default report
function spotReportDefault. The tree illustrates that temp has the largest effect.
Values at the terminal node t_i show the average function value and the number
of observations (obs) which fulfill the conditions given by following
the tree from the root node to t_i. Smaller temp values improve SANN's
performance. A value of temp which is smaller than 9.14 results in an average
function value of y = 0.408. This result is based on 213 observations

Figure 5: Comparison of SANN's default parameter values (default) with
parameter settings obtained with SPOT, where random forest was chosen as a meta
model (spotRf)

Now we have to consider parameters from the algorithm design, i.e., param- 
eters that control the behavior of the SANN algorithm, namely tmax and temp. 
Finally, we can start the optimization algorithm (SANN). The run is finished 
with the following summary: 

Min. 1st Qu. Median Mean 3rd Qu. Max. 
0.3995 0.4037 0.4174 0.9716 0.6577 4.0670 

In order to illustrate the performance gain from SPOT's tuning procedure,
SANN is run with the tuned parameter configuration, i.e., temp = 1.283295 and
tmax = 41. Results from these ten SANN runs can be summarized as follows:

Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3980 0.3982 0.3984 0.3995 0.3989 0.4047

Further statistical analyses, e.g., the box plot shown in Fig. 5, reveal that
this difference is statistically significant. Hence, hypothesis (H-1) cannot be
rejected. After this quick introduction we will have a closer look at SPOT.
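The comparison described in this section can be sketched in base R as follows. This is an illustration under stated assumptions rather than the code from the Appendix: the seeds 1 to 10, the helper names `branin` and `sann_values`, and the choice of a one-sided Wilcoxon rank-sum test are assumptions. The tuned values temp = 1.283295 and tmax = 41 are those reported above. Because the seeds may differ from the ones used for the summaries above, the printed numbers need not match them.

```r
# Branin objective function, see Eq. (1)
branin <- function(x) {
  (x[2] - 5.1 / (4 * pi^2) * x[1]^2 + 5 / pi * x[1] - 6)^2 +
    10 * (1 - 1 / (8 * pi)) * cos(x[1]) + 10
}

# Final function values of SANN, one run per seed (seeds are an assumption)
sann_values <- function(temp, tmax, seeds = 1:10,
                        x0 = c(10, 10), maxit = 250) {
  vapply(seeds, function(s) {
    set.seed(s)
    optim(x0, branin, method = "SANN",
          control = list(maxit = maxit, temp = temp, tmax = tmax))$value
  }, numeric(1))
}

y_default <- sann_values(temp = 10, tmax = 10)        # default setting
y_tuned   <- sann_values(temp = 1.283295, tmax = 41)  # setting found by SPOT

print(summary(y_default))
print(summary(y_tuned))

# One possible test: are default results stochastically larger (worse)?
test <- wilcox.test(y_default, y_tuned, alternative = "greater")
print(test$p.value)
```

Using the same seeds for both settings keeps the comparison reproducible; a box plot of the two samples, e.g., via boxplot(list(default = y_default, tuned = y_tuned)), gives the visual counterpart.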

3 Sequential Parameter Optimization 
3.1 Definition 

Figure 6: Steps and contexts of performing an experiment from research question
to scientific result

Definition 3.1 (Sequential Parameter Optimization) Sequential parameter
optimization (SPO) is a framework for tuning and understanding of
algorithms by active experimentation. SPO employs methods from error statistics
to obtain reliable results. It comprises the following elements:

SPO-1: Scientific questions
SPO-2: Statistical hypotheses
SPO-3: Experiments
SPO-4: Scientific meaning

□ 

These elements can be explained as follows. The starting point of the
investigation is a scientific question (SPO-1). This question often deals with assumptions
about algorithms, e.g., influence of parameter values or new operators. This
(complex) question is broken down into (simple) statistical hypotheses (SPO-2)
for testing, see Bartz-Beielstein (2008) for an example. Next, experiments can
be performed for each hypothesis, e.g., (H-1) as defined in Sect. 2.5:

a) Select a model F (e.g., random forest) to describe a functional relationship. 

b) Select an experimental design, e.g., Latin hypercube design. 

c) Generate data, i.e., perform experiments. 

d) Refine the model until the hypothesis can be accepted/rejected. 

Performing these experiments will be referred to as step (SPO-3). Finally, to
assess the scientific meaning of the results from an experiment, conclusions are
drawn from the hypotheses. This is step (SPO-4) in the sequential
parameter optimization framework, see Definition 3.1. Figure 6 illustrates the SPO
framework. SPOT implements the steps from the statistical context.

This article describes one specific instance of (SPO-3), which implements the
corresponding software programs in R. It will be referred to as SPOT.

3.2 Sequential Parameter Optimization Toolbox

We introduce R's SPOT package as one possible implementation of step (SPO-3) 
from the SPO framework. Implementations in other programming languages, 
e.g., MATLAB, are also available but are not subject of this article. 

The SPO toolbox was developed over recent years by Thomas Bartz-Beielstein,
Christian Lasarczyk, and Mike Preuss (Bartz-Beielstein et al. 2005). The main goals
of SPOT are (i) the determination of improved parameter settings for
optimization algorithms and (ii) the provision of statistical tools for analyzing and
understanding their performance.

Definition 3.2 (Sequential Parameter Optimization Toolbox) The sequen- 
tial parameter optimization toolbox implements the following features, which are 
related to step (SPO-3) from the SPO framework. 

SPOT-1: Use the available budget (e.g., simulator runs, number of function 
evaluations) sequentially, i.e., use information from the exploration 
of the search space to guide the search by building one or several meta 
models. Choose new design points based on predictions from the meta 
model(s). Refine the meta model(s) stepwise to improve knowledge 
about the search space. 

SPOT-2: If necessary, try to cope with noise by improving confidence. Guar- 
antee comparable confidence for search points. 

SPOT-3: Collect information to learn from this tuning process, e.g., apply ex- 
ploratory data analysis. 

SPOT-4: Provide mechanisms both for interactive and automatic tuning.

□ 



The article entitled "sequential parameter optimization" (Bartz-Beielstein et al.
2005) was the first attempt to summarize results from seminars and tutorials
given at conferences such as CEC and GECCO and make this approach known
to and available for a broader audience (Beielstein 2002; Bartz-Beielstein and
Preuß 2004; Bartz-Beielstein 2005; Bartz-Beielstein and Preuß 2005a).



SPOT was successfully applied in the fields of bioinformatics (Volkert 2006;
Fober et al. 2009), environmental engineering (Konen et al. 2009; Flasch et al.
2010), shipbuilding (Rudolph et al. 2009), fuzzy logic (Yi 2008), multimodal
optimization (Preuss et al. 2007), statistical analysis of algorithms (Lasarczyk
2007; Trautmann and Mehnen 2009), multicriteria optimization (Bartz-Beielstein
et al. 2009), genetic programming (Lasarczyk and Banzhaf 2005),
particle swarm optimization (Bartz-Beielstein et al. 2004; Kramer et al. 2007),
automatic and manual parameter tuning (Fober 2006; Smit and Eiben 2009;
Hutter et al. 2010a,b), graph drawing (Tosic 2006; Pothmann 2007), aerospace
and shipbuilding industry (Naujoks et al. 2006), mechanical engineering (Mehnen
et al. 2007), and chemical engineering (Henrich et al. 2008). Bartz-Beielstein
(2010) collects more than 100 publications related to the sequential parameter
optimization.

Algorithm 1: SPOT

    // phase 1, building the model:
 1  let A be the tuned algorithm;
 2  generate an initial population P = {p^1, ..., p^m} of m parameter vectors;
 3  let k = k0 be the initial number of tests for determining estimated utilities;
 4  foreach p ∈ P do
 5      run A with p k times to determine the estimated utility (e.g., average
        function value from 10 runs) of p;
    // phase 2, using and improving the model:
 6  while termination criterion not true do
 7      let p* denote the parameter vector from P with best estimated utility;
 8      let k be the number of repeats already computed for p*;
 9      build prediction model F based on P and {u^1, ..., u^|P|};
10      generate a set P' of l new parameter vectors by random sampling;
11      foreach p ∈ P' do
12          calculate F(p) to determine the predicted utility of p;
13      select set P'' of d parameter vectors from P' with best predicted utility;
14      run A with p* once and recalculate its estimated utility using all k + 1 test
        results; // (improve confidence)
15      update k, e.g., let k = k + 1;
16      run A k times with each p ∈ P'' to determine the estimated utility of p;
17      extend the population by P = P ∪ P'';

3.3 Elements of the SPOT Framework
3.3.1 The General SPOT Scheme 

Algorithm 1 presents a formal description of the SPOT scheme. The utility is used
to measure the algorithm's performance. Typical measures are the estimated mean
or median from several runs of A. Algorithm 1 consists of two phases, namely
the first construction of the model (lines 1-5) and its sequential improvement
(lines 6-17). Phase 1 determines a population of initial designs in algorithm
parameter space and runs the algorithm k times for each design. Phase 2 consists
of a loop with the following components:
parameter space and runs the algorithm k times for each design. Phase 2 consists 
of a loop with the following components: 

1. Update the meta model F (or several meta models Fi) by means of the 
obtained data. 

2. Generate a (large) set of design points and compute their utility by sam- 
pling the model. 

3. Select the seemingly best design points and run the algorithm for these. 

4. The new design points are added to the population and the loop starts
over if the termination criterion is not reached.

A counter k is increased in each cycle and used to determine the number of
repeats that are performed for each setting, so that the obtained results are
statistically sound. Consequently, the best design points so far
are also run again to obtain a comparable number of repeats. These
reevaluations may worsen the estimated performance, which explains the increasing
Y values in Fig. 3.
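The two phases described above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not SPOT's implementation: a quadratic regression fitted with lm replaces the random forest, the initial design is drawn uniformly at random, the number of repeats k is kept fixed instead of being increased, and only one new design point is added per step. The function names `branin`, `run_sann`, and `spo_sketch` as well as the range 1 to 50 for tmax are hypothetical.

```r
branin <- function(x) {
  (x[2] - 5.1 / (4 * pi^2) * x[1]^2 + 5 / pi * x[1] - 6)^2 +
    10 * (1 - 1 / (8 * pi)) * cos(x[1]) + 10
}

# One SANN run; returns the final function value
run_sann <- function(temp, tmax) {
  optim(c(10, 10), branin, method = "SANN",
        control = list(maxit = 250, temp = temp, tmax = tmax))$value
}

spo_sketch <- function(m = 10, k = 2, steps = 3, n_cand = 200, seed = 1) {
  set.seed(seed)
  # Phase 1: initial design, each design point is run k times
  design <- data.frame(temp = runif(m, 1, 50),
                       tmax = sample(1:50, m, replace = TRUE))
  runs <- design[rep(seq_len(m), each = k), ]
  runs$y <- mapply(run_sann, runs$temp, runs$tmax)

  # Phase 2: sequential improvement
  for (s in seq_len(steps)) {
    # Estimated utility: mean function value per configuration
    util <- aggregate(y ~ temp + tmax, data = runs, FUN = mean)
    # Meta model (assumption: quadratic regression instead of random forest)
    model <- lm(y ~ temp + tmax + I(temp^2) + I(tmax^2), data = util)
    # Large candidate set, evaluated on the model only
    cand <- data.frame(temp = runif(n_cand, 1, 50),
                       tmax = sample(1:50, n_cand, replace = TRUE))
    best_new <- cand[which.min(predict(model, newdata = cand)), ]
    best_old <- util[which.min(util$y), ]
    # Run the new point k times and re-run the best point so far once
    new_runs <- data.frame(temp = c(rep(best_new$temp, k), best_old$temp),
                           tmax = c(rep(best_new$tmax, k), best_old$tmax))
    new_runs$y <- mapply(run_sann, new_runs$temp, new_runs$tmax)
    runs <- rbind(runs, new_runs)
  }
  util <- aggregate(y ~ temp + tmax, data = runs, FUN = mean)
  util[which.min(util$y), ]
}

best <- spo_sketch()
print(best)
```

Because the best configuration is re-evaluated in every step, its estimated utility can get worse from one step to the next, which is the effect discussed above.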

Sequential approaches can be more efficient than approaches that evaluate
the information in one step only (Wald 1947). This presumes an experienced
operator who is able to draw the right conclusions from the first results. If
the operator is new to SPOT, the sequential steps can be started automatically.
Compared to interactive procedures, performance in the automatic tuning
process may decrease. However, results from different algorithm runs, e.g., ES and
SANN, will be comparable in an objective manner if data for the comparison
is based on the same tuning procedure.

Extensions to the SPOT approach were proposed by other authors, e.g.,
Lasarczyk (2007) integrated an optimal computational budget allocation procedure,
which is based on ideas by Chen et al. (2003). Due to SPOT's plugin structure,
see Sect. 5, further extensions can easily be integrated.



3.3.2 Running SPOT 

In Sect. 2.6.2, SPOT was run as an automatic tuner. Steps from the automatic
mode can be used in an interactive manner. SPOT can be started with the
command

spot(<configurationfile>, <task>)

where configurationfile is the name of the SPOT configuration file and task
can be one of the tasks init, seq, run, rep, or auto. SPOT can also be run in a
meta mode to perform tuning over a set of problem instances.



Files Used During the Tuning Process Each configuration file belongs to 
one SPOT project, if the same basename is used for corresponding files. SPOT 
uses simple text files as interfaces from the algorithm to the statistical tools. 

1. The user has to provide the following files: 

(i) Region of interest (ROI) files specify the region over which the al- 
gorithm parameters are tuned. Categorical variables such as the 
recombination operator in ES, can be encoded as factors, e.g., "in- 
termediate recombination" and "discrete recombination." 

(ii) Algorithm problem design (APD) files are used to specify parameters used by
the algorithm, e.g., problem dimension, objective function, starting
point, or initial seed.

(iii) Configuration files (CONF) specify SPOT specific parameters, such
as the prediction model or the initial design size.

2. SPOT will generate the following files:

(i) Design files (DES) specify algorithm designs. They are generated
automatically by SPOT and will be read by the optimization
algorithms.

(ii) After the algorithm has been started with a parametrization from
the algorithm design, the algorithm writes its results to the result
file (RES). Result files provide the basis for many statistical
evaluations/visualizations. They are read by SPOT to generate prediction
models. Additional prediction models can easily be integrated into
SPOT.

Figure 7 illustrates SPOT interfaces and the data flow. The acronym EDA
(exploratory data analysis) summarizes additional information that can be used
to add further statistical tools. For example, SPOT writes a best file (BST),
which summarizes information about the best configuration during the tuning
process. Note that the problem design can be modified, too. This can be done
to analyze the robustness (effectivity) of algorithms.

SPOT Tasks SPOT provides tools to perform the following tasks (see also
Fig. 8):

1. Initialize. An initial design is generated. This is usually the first step 
during experimentation. The employed parameter region (ROI) and the 
constant algorithm parameters (APD) have to be provided by the user. 
spot's parameters are specified in the CONF file. Although it is rec- 
ommended to use the same basename for CONF, ROI, and APD files 
in order to define a project, this is not mandatory. SPOT allows a flexi- 
ble combination of different filenames, e.g., one APD file can be used for 
different projects. 

2. Run. This is usually the second step. The optimization algorithm is 
started with configurations of the generated design. Additionally infor- 
mation about the algorithms problem design are used in this step. The 
algorithm writes its results to the result file. 

3. Sequential step. A new design, based on information from the result file, 
is generated. A prediction model is used in this step. Several generic pre- 
diction models are available in SPOT by default. To perform an efhcient 
analysis, especially in situations when only few algorithms runs are possi- 
ble, user-specified prediction models can easily be integrated into SPOT. 

4. Report. An analysis, based on information from the result file, is generated. 
Since all data flow is stored in files, new report facilities can be added very 
easily. SPOT contains some scripts to perform a basic regression analysis 
and plots such as histograms, scatter plots, plots of the residuals, etc. 
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Figure 7: SPOT interfaces. The SPOT loop can be described as follows: Config- 
uration (CONF) and region- of -interest (ROI) files are read by SPOT (a). SPOT 
generates a design (DES) file (b). The algorithm reads the design file and (c) ex- 
tra information, e.g., about the problem dimension from the algorithm-problem 
design (APD) file (d). Output from the optimization algorithm are written to 
the result (RES) file (e). The result file is used by SPOT to build the predic- 
tion model (f). Data can be used by exploratory data analysis (EDA) tools to 
generate reports, statistics, visualizations, etc. (g) 
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5. Automatic mode. In the automatic mode, the steps run and sequential 
are performed after an initialization for a predetermined number of times. 



6. Meta mode. In the meta mode, the tuning process is repeated for several 
configurations. For example, tuning can be performed for different starting 
points x0, several dimensions, or randomly chosen problem instances. 



As stated in Sect. 3.2, SPOT has been applied to several optimization tasks, 
which might give further hints on how SPOT can be used. Bartz-Beielstein and 
Preuss (2010) and Bartz-Beielstein et al. (2010) present case studies 
that may serve as good starting points for SPOT applications. 



4 Details 

We will discuss functions which are used during the four SPOT steps initialize, 
run, sequential, and report. 

4.1 Initialize 

During this step, the initial design is generated and written to the design file. 
spotCreateDesignLhs, which is based on R's lhs package, is recommended as 
a simple space-filling design. 
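
To illustrate what a Latin hypercube design looks like, the following base-R sketch generates ten points in the region [1; 50] x [1; 50]. It is not the code of spotCreateDesignLhs or of the lhs package; the helper name simpleLhs and its arguments are assumptions made for this example only.

```r
# Minimal Latin hypercube sketch (hypothetical helper, not SPOT's plugin).
# Each variable's range is cut into `size` equal strata; every stratum
# receives exactly one point, and strata are paired at random across variables.
simpleLhs <- function(size, low, high) {
  n <- length(low)
  design <- matrix(NA_real_, nrow = size, ncol = n)
  for (j in seq_len(n)) {
    strata <- sample(size) - 1            # random permutation of 0..size-1
    u <- (strata + runif(size)) / size    # one uniform draw inside each stratum
    design[, j] <- low[j] + u * (high[j] - low[j])
  }
  design
}

set.seed(1)
des <- simpleLhs(size = 10, low = c(1, 1), high = c(50, 50))
colnames(des) <- c("TEMP", "TMAX")
print(des)
```

Each column of the resulting matrix visits every tenth of its range exactly once, which is the property that makes the design space filling even for small sample sizes.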

Alternatively, factorial designs can be used. spotCreateDesignFrF2, which 
is based on Groemping's FrF2 package, see 
http://cran.r-project.org/web/packages/FrF2/index.html, generates a 
fractional factorial design with center point. 

Furthermore, the number of initial design points, the type of the experimental 
design, etc. have to be specified before the first SPOT run is performed. This 
information is stored in the configuration file (CONF), see Listing 1. 

Listing 1: demo07RandomForestSann.conf 

alg.func = "spotAlgStartSann" 
auto.loop.nevals = 100 
init.design.func = "spotCreateDesignLhs" 
init.design.size = 10 
init.design.repeats = 2 
seq.predictionModel.func = "spotPredictRandomForest" 

The configuration file plays a central role in SPOT's tuning process. It stores 
information about the optimization algorithm (alg.func) and the meta model 
(seq.predictionModel.func). SPOT uses a classification scheme for its 
variables: init refers to variables which are used during the initialization step, 
seq are variables used during the sequential step, and so forth. 
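
The classification scheme can be made visible by grouping the settings by their prefix. The sketch below parses the lines of Listing 1 into a named list; the helper parseConf is hypothetical and only serves this illustration, since how SPOT itself reads the CONF file is not described here.

```r
# Sketch: parse "key = value" lines into a named list (hypothetical helper).
confLines <- c(
  'alg.func = "spotAlgStartSann"',
  'auto.loop.nevals = 100',
  'init.design.func = "spotCreateDesignLhs"',
  'init.design.size = 10',
  'init.design.repeats = 2',
  'seq.predictionModel.func = "spotPredictRandomForest"'
)

parseConf <- function(lines) {
  conf <- list()
  for (line in lines) {
    parts <- strsplit(line, "=", fixed = TRUE)[[1]]
    key <- trimws(parts[1])
    value <- trimws(paste(parts[-1], collapse = "="))
    value <- gsub('^"|"$', "", value)              # drop surrounding quotes
    num <- suppressWarnings(as.numeric(value))     # numbers stay numeric
    conf[[key]] <- if (is.na(num)) value else num
  }
  conf
}

conf <- parseConf(confLines)

# Group the setting names by their prefix (alg, auto, init, seq)
prefix <- sub("\\..*$", "", names(conf))
print(split(names(conf), prefix))
```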

The experimental region is specified in the region of interest (ROI) file, see 
Listing 2. In the demo07RandomForestSann project, two numerical variables 
with values from the interval [1; 50] are used. SPOT employs a mechanism which 
adapts the region of interest automatically. Information about the actual region 
of interest is stored in the aroi file, which is handled by SPOT internally. 

Figure 8: The SPOT process. White font color indicates steps used in the 
interactive process only. A typewriter font indicates the corresponding SPOT 
commands. To start the automatic mode, simply use the task auto. Note that the 
interaction points are optional, so SPOT can be run without any user interaction. 
Meta runs perform tuning processes for several configurations. 

Listing 2: demo07RandomForestSann.roi 

name low high type 
TEMP 1 50 FLOAT 
TMAX 1 50 FLOAT 



Now all the source files are available. In order to generate the initial design, 
simply call SPOT as follows. 

spot("demo07RandomForestSann.conf", "init") 

The "init" call generates a design file (DES), which is shown in Listing 3. 



Listing 3: Design file demo07RandomForestSann.des generated by SPOT 



TEMP TMAX CONFIG REPEATS STEP SEED 
35.6081542731496 20.5193298289552 1 2 0 1235 
3.03074611076154 30.9299647800624 2 2 0 1235 
35.0430960887112 11.6981793923769 3 2 0 1235 
18.7132237656275 49.8082008753903 4 2 0 1235 
13.9964890434407 35.3148515065433 5 2 0 1235 
26.2654501354089 25.7817166472087 6 2 0 1235 
24.1049260527361 3.67511987574399 7 2 0 1235 
7.34134425916709 25.289775890857 8 2 0 1235 
49.035177000449 42.5625789203448 9 2 0 1235 
42.5434358732775 7.34647449669428 10 2 0 1235 

Since we have chosen the SPOT plugin spotCreateDesignLhs, a Latin 
hypercube design is generated. Each configuration is labeled with a 
configuration number. The column REPEATS contains information from the variable 
init.design.repeats. Since no meta model has been created yet, STEP is set 
to 0 for each configuration. Finally, the SEED, which is used by the algorithm, 
is shown in the last column. 
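
How REPEATS and SEED translate into individual algorithm runs can be sketched as follows. The rule seed = SEED + i - 1 for the i-th repeat is taken from the run plugin in Listing 8; the data frame below copies the first two rows of Listing 3, with STEP assumed to be 0, and the column name REPEAT is introduced for this illustration only.

```r
# First two rows of the design file from Listing 3 (STEP assumed to be 0).
des <- read.table(header = TRUE, text = "
TEMP TMAX CONFIG REPEATS STEP SEED
35.6081542731496 20.5193298289552 1 2 0 1235
3.03074611076154 30.9299647800624 2 2 0 1235
")

# One row per algorithm run: configuration k is repeated REPEATS[k] times,
# and repeat i uses the seed SEED[k] + i - 1.
runs <- do.call(rbind, lapply(seq_len(nrow(des)), function(k) {
  i <- seq_len(des$REPEATS[k])
  data.frame(CONFIG = des$CONFIG[k], REPEAT = i, SEED = des$SEED[k] + i - 1)
}))
print(runs)
```

Using a different seed for every repeat of the same configuration is what allows the variance of a stochastic algorithm to be estimated from the result file.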

4.2 Run 

Parameters from the design file are read and the algorithm is executed. Each 
run results in one fitness value (single-objective optimization) or several values 
(multi-objective optimization). Fitness values with corresponding parameter 
settings are written to the result file. The user has to set up her own interface 
for her algorithm A. Examples are provided, see Sect. 5.2. The command 

spot("demo07RandomForestSann.conf", "run") 

executes the run task. Results from this run are written to the result file (RES), 
which is shown in Listing 4. 

Listing 4: demo07RandomForestSann.res 

Y TEMP TMAX FUNCTION DIM SEED CONFIG STEP 
3.30597157332377 35.6081542731496 21 BraninO.O 2 1235 1 0 
5.0386282545543 35.6081542731496 21 BraninO.O 2 1236 1 0 
0.400306954041771 3.03074611076154 31 BraninO.O 2 1235 2 0 
0.398257998573415 3.03074611076154 31 BraninO.O 2 1236 2 0 
0.445520325383134 35.0430960887112 12 BraninO.O 2 1235 3 0 
2.226649807247 35.0430960887112 12 BraninO.O 2 1236 3 0 
0.497684708317145 18.7132237656275 50 BraninO.O 2 1235 4 0 
1.83495383747858 18.7132237656275 50 BraninO.O 2 1236 4 0 
0.399119231011623 13.9964890434407 35 BraninO.O 2 1235 5 0 
0.542312919825205 13.9964890434407 35 BraninO.O 2 1236 5 0 
2.72322377892531 26.2654501354089 26 BraninO.O 2 1235 6 0 
0.417190547410852 26.2654501354089 26 BraninO.O 2 1236 6 0 
4.7533857289203 24.1049260527361 4 BraninO.O 2 1235 7 0 
1.61981739130332 24.1049260527361 4 BraninO.O 2 1236 7 0 
0.403447629608046 7.34134425916709 25 BraninO.O 2 1235 8 0 
0.411160853072728 7.34134425916709 25 BraninO.O 2 1236 8 0 
2.98491602241286 49.035177000449 43 BraninO.O 2 1235 9 0 
5.24485760127901 49.035177000449 43 BraninO.O 2 1236 9 0 
6.91947230812718 42.5434358732775 7 BraninO.O 2 1235 10 0 
0.521764744656569 42.5434358732775 7 BraninO.O 2 1236 10 0 






4.3 Sequential 

Now that results have been written to the result file, the meta model can be 
built. 

spot("demo07RandomForestSann.conf", "seq") 

The sequential call generates a new design file (DES), which is shown in 
Listing 5. 

Listing 5: Design file demo07RandomForestSann.des generated by SPOT 

TEMP TMAX CONFIG REPEATS repeatsLastConfig STEP SEED 
3.03074611076154 31 2 1 2 1 1237 
6.13796767716994 31.5014664068934 11 3 3 1 1235 
1.50800024462165 28.6290849421476 12 3 3 1 1235 



In order to improve confidence, the best solution found so far is evaluated again. 
To enable fair comparisons, new configurations are evaluated as many times as 
the best configuration found so far. Note that other update schemes are possible. 

If SPOT's budget is not exhausted, the new configurations are evaluated, i.e., 
run is called again, which updates the result file. In the following step, seq is 
called again, etc. 

To support exploratory data analysis, SPOT also generates a best file, which 
is shown in Listing 6. 

Listing 6: Best file demo07RandomForestSann.bst generated by SPOT 

Y TEMP TMAX COUNT CONFIG 
0.398592091098704 3.03074611076154 31 2 2 
0.39950017406572 1.8086370804091 31 3 11 
0.399448475332373 1.8086370804091 31 4 11 
0.399732585200016 1.8086370804091 31 5 11 
0.399443742300062 1.8086370804091 31 6 11 
0.400057920171103 3.03074611076154 31 3 2 
0.400239513036257 1.8086370804091 31 7 11 
0.400845187050665 1.8086370804091 31 9 11 
0.400892337114249 3.82960996863898 30 4 13 
0.401138794573396 1.8086370804091 31 10 11 
0.399842856102580 1.20055107064079 31 10 29 
0.399842856102580 1.20055107064079 31 10 29 
0.399842856102580 1.20055107064079 31 10 29 

The best file is updated after each SPOT iteration and can be used for an on-line 
visualization tool, e.g., to illustrate search progress or stagnation, see Fig. 3. 
The variable COUNT reports the number of REPEATS used for this specific 
configuration. 

4.4 Report 

If SPOT's termination criterion is fulfilled, a report is generated. By default, 
SPOT provides a simple report function which reads data from the res file and 
produces the following output: 

Best solution found with 223 evaluations: 

Y TEMP TMAX COUNT CONFIG 

0.3998429 1.200551 31 10 29 

4.5 Automatic 

SPOT's auto task performs the steps init, run, seq, run, seq, etc. until the 
termination criterion is fulfilled, see Fig. 8. It can be invoked from R's command 
line via 

spot("demo07RandomForestSann.conf", "auto") 
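
The control flow of the auto task can be sketched in plain R. The following toy loop is not SPOT's implementation: it uses a fixed candidate pool, a first-order linear model as meta model, and a budget of 30 SANN runs, all of which are assumptions chosen to keep the example short. The names runSann, budget, and cand are hypothetical.

```r
# Toy version of the auto loop: init, then run/seq until the budget is used up.
branin <- function(x) {
  (x[2] - 5.1 / (4 * pi^2) * x[1]^2 + 5 / pi * x[1] - 6)^2 +
    10 * (1 - 1 / (8 * pi)) * cos(x[1]) + 10
}

# "run" task: one SANN run for a given configuration and seed
runSann <- function(temp, tmax, seed) {
  set.seed(seed)
  optim(c(10, 10), branin, method = "SANN",
        control = list(maxit = 250, temp = temp, tmax = round(tmax)))$value
}

budget <- 30                                   # total number of SANN runs
set.seed(123)
# "init" task: a random initial design in the region of interest
des <- data.frame(TEMP = runif(10, 1, 50), TMAX = runif(10, 1, 50))
# candidate pool that the "seq" task scores with the meta model
cand <- data.frame(TEMP = runif(500, 1, 50), TMAX = runif(500, 1, 50))
res <- data.frame()

while (nrow(res) < budget) {
  # run: evaluate every configuration of the current design once
  for (k in seq_len(nrow(des))) {
    if (nrow(res) >= budget) break
    y <- runSann(des$TEMP[k], des$TMAX[k], seed = 1235 + nrow(res))
    res <- rbind(res, data.frame(Y = y, TEMP = des$TEMP[k], TMAX = des$TMAX[k]))
  }
  # seq: fit the meta model and propose the five most promising candidates
  fit <- lm(Y ~ TEMP + TMAX, data = res)
  des <- cand[order(predict(fit, newdata = cand))[1:5], ]
}

best <- res[which.min(res$Y), ]
print(best)
```

The loop stops as soon as the number of rows in the result table reaches the budget, which mirrors the role of auto.loop.nevals in the configuration file.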

5 Plugins 

SPOT comes with a basic set of R functions for generating initial designs, starting 
optimization algorithms, building meta models, and generating reports. This 
set can easily be extended with user-defined R functions, so-called plugins. 
Further plugins will be added in forthcoming SPOT versions. Here, we describe 
the interfaces that are necessary for integrating user-defined plugins into SPOT. 

5.1 Initialization Plugins 

The default plugin for generating an initial design is init.design.func = 
"spotCreateDesignLhs". It uses information about the size of the initial design, 
init.design.size. The number $n$ of design variables $x_i$ ($i = 1, \ldots, n$), 
their names pNames, and their ranges $a_i \leq x_i \leq b_i$ can be determined 
with SPOT's internal alg.roi variable, which is passed to the initialization plugin. 

> pNames <- row.names(alg.roi); 

> a <- alg.roi[ ,"low"]; 

> b <- alg.roi[ ,"high"]; 

Note that pNames, a, and b are vectors of size $n$. Based on this information, a 
data frame with initial design points is generated. For example, the data frame 
from the demo07RandomForestSann project reads: 

TEMP TMAX 

1 35.608154 20.519330 

2 3.030746 30.929965 

3 35.043096 11.698179 

4 18.713224 49.808201 

5 13.996489 35.314852 

6 26.265450 25.781717 

7 24.104926 3.675120 

8 7.341344 25.289776 

9 49.035177 42.562579 

10 42.543436 7.346474 

These values are written to the initial design file, see Listing 3. The reader is 
referred to the spotCreateDesignLhs function for further details. 

The plugin spotCreateDesignFrF2 generates a central composite design 
and can be used as a template for fractional factorial design plugins. Currently, 
SPOT implements the following init plugins: 

• spotCreateBasicDoe3R: creates a fractional-factorial design (resolution 
III) 

• spotCreateDesignFrF2: creates a resolution III design with center point and 
star points 

• spotCreateDesignLhs: creates a Latin hypercube design 

Note that these plugins should be used as templates and can easily be adapted to 
specific situations. 



5.2 Run Plugins 



The run plugin spotAlgStartSann, which is used as an interface to R's SANN 
algorithm, is shown in the Appendix, see Listing 8. Basically, the user has to 
specify the variable names to be read from the design file, see Sect. 5.2.1, the 
call of the algorithm, see Sect. 5.2.2, and the variables to be written to the 
result file, see Sect. 5.2.3. 



5.2.1 Reading Values From the Design File 

To add a new variable, say COLOR, the user simply adds the following line of 
code to the run file: 

if (is.element("COLOR", pNames)) {color <- des$COLOR[k]} 

5.2.2 Executing the Algorithms 

Next, the call of the algorithm has to be specified. In our example, the call reads: 

y <- optim(x0, spotFunctionBranin, method="SANN", 
    control=list(maxit=maxit, temp=temp, tmax=tmax, parscale=parscale, 
    color=color)) 



5.2.3 Writing Results to the Result File 

And finally, in order to write the variable to the result file, it has to be added 
to the following list: 

res <- list(Y = y, TEMP = temp, TMAX = tmax, 
COLOR = color, SEED = seed, CONFIG = conf) 



5.2.4 Interfacing With Algorithms Written in Other Programming 
Languages 

We will demonstrate how Java programs can be called from SPOT. The 
procedure consists of two steps: First, a call string is built. Then, R's system 
function is used for executing the callString. 

callString <- paste("java -jar simpleOnePlusOneES.jar", 
    seed, steps, target, f, n, xp0, sigma0, a, g, px, py, sep = " ") 

y <- system(callString, intern = TRUE) 

This procedure can be applied to any optimization algorithm. Templates for 
state-of-the-art optimization algorithms will be added to forthcoming SPOT 
versions. Users are encouraged to submit interfaces to their algorithms to the SPOT 
development team. 

Currently, SPOT implements the following run plugins: 

• spotAlgStartSann: Interface to R's simulated annealing SANN 

• spotAlgStartES: Interface to an ES based on Beyer and Schwefel (2002) 

• spotFuncStartBranin: Interface to the Branin function. SPOT is used as 
an optimizer, not as a tuner, see also Sect. 6.4 

Additional run packages are available, e.g., 

> demo(spotDemo11Java, ask=FALSE) 

demonstrates how a (1+1)-ES, which is implemented in Java, can be tuned with 
SPOT. 

5.3 Sequential Plugins 



During SPOT's sequential step, one or several meta models are generated. These 
models use information from the result file. New, promising design points are 
generated as follows: a large number of randomly generated design points are 
evaluated on the meta model. Configurations with the best estimated objective 
function values are written to the design file and will be evaluated during the 
run step, see line 12 in Algorithm 1. 

SPOT provides two types of data assembled from the result file: Raw data 
comprise parameter values x (configurations) and related objective function 
values y, whereas merged data map identical configurations $x_i$ to one 
configuration $x_j$. The corresponding $y_i$ values are merged according to the 
merge function (default: mean), e.g., $y_j = \sum_{i=1}^{n} y_i / n$. In the 
example, the random forest is generated with raw data. R's generic predict 
function is used to evaluate new data on the meta model (random forest). 
Finally, the best design points are determined. 
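
The difference between raw and merged data can be illustrated with the first four rows of the result file from Listing 4. The sketch below uses base R's aggregate with the mean as merge function; it is an illustration of the idea and not SPOT's internal code.

```r
# First four result rows from Listing 4 (only the columns needed here).
rawB <- read.table(header = TRUE, text = "
Y TEMP TMAX CONFIG
3.30597157332377 35.6081542731496 21 1
5.0386282545543 35.6081542731496 21 1
0.400306954041771 3.03074611076154 31 2
0.398257998573415 3.03074611076154 31 2
")

# Merged data: one row per configuration, Y replaced by the mean over repeats.
mergedB <- aggregate(Y ~ TEMP + TMAX + CONFIG, data = rawB, FUN = mean)
print(mergedB)
```

Raw data keep the information about the variance between repeats, whereas merged data give every configuration the same weight regardless of how often it was evaluated.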

The random forest meta model is implemented as shown in Listing 7. 

Listing 7: spotPredictRandomForest.R 

spotPredictRandomForest <- function(rawB, mergedB, largeDesign, 
                                    spotConfig) { 
  spotInstAndLoadPackages("randomForest") 
  # all columns except the response y are treated as predictors 
  xNames <- setdiff(names(rawB), "y") 
  x <- rawB[, xNames]; y <- rawB$y 
  fit <- randomForest(x, y) 
  # predict the response for every candidate point 
  res <- predict(fit, largeDesign) 
  # sort candidates by predicted value (smallest first) and keep the best 
  largeDesign <- largeDesign[order(res, decreasing = FALSE), ] 
  newDesign <- largeDesign[1:spotConfig$seq.design.new.size, ] 
} 
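
The same three steps (fit, predict on a large candidate set, keep the best candidates) can be tried without any add-on package. The sketch below deliberately swaps the random forest for a quadratic linear model fitted with lm, and it runs on synthetic toy data rather than on a SPOT result file; it is not SPOT's spotPredictLm plugin, and all names and sizes are illustrative.

```r
# Sketch of a sequential step with a quadratic linear model as meta model.
set.seed(1)

# Toy "result file": 20 configurations with a noisy bowl-shaped response.
rawB <- data.frame(TEMP = runif(20, 1, 50), TMAX = runif(20, 1, 50))
rawB$y <- (rawB$TEMP - 5)^2 / 100 + (rawB$TMAX - 30)^2 / 100 +
  rnorm(20, sd = 0.1)

# 1. Fit the meta model to the raw data.
fit <- lm(y ~ TEMP + TMAX + I(TEMP^2) + I(TMAX^2) + TEMP:TMAX, data = rawB)

# 2. Evaluate the model on many cheap, randomly generated candidates.
largeDesign <- data.frame(TEMP = runif(1000, 1, 50), TMAX = runif(1000, 1, 50))
pred <- predict(fit, newdata = largeDesign)

# 3. Keep the candidates with the smallest predicted values (minimization).
newSize <- 3
newDesign <- largeDesign[order(pred, decreasing = FALSE)[seq_len(newSize)], ]
print(newDesign)
```

Evaluating the meta model is cheap compared with running the algorithm, which is why a candidate set of a thousand points is affordable even when only a few dozen real runs are.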

Currently (June 2010), SPOT provides interfaces to the following meta modeling 
approaches: 

• Regression models (lm; rsm): 

1. spotPredictLm 

2. spotPredictLmOptim 

• Tree-based models (tree; randomForest) 

1. spotPredictTree 

2. spotPredictRandomForest 

• Gaussian process models (mlegp; tgp) 

1. spotPredictTgp 

2. spotPredictMlegp 

The Appendix presents an example (Listing 9) of how several meta models can be 
combined. Interfaces to further meta models will be provided in future releases 
of the SPOT package. 

5.4 Report Plugins 



SPOT comes with a simple report plugin, spotReportDefault.R. It reports the 
best configuration from the tuning procedure, illustrates the tuning process 
(evolution of the best solution as shown in Figs. 3 and 13), and generates a 
simple regression tree as shown in Fig. 9. 

User-defined report functions can easily be added. Wolfgang Konen has 
written a report plugin which uses randomForest to visualize factor effects, 
see Fig. 10. Figure 11 demonstrates how EDA tools can be applied to analyze 
effects and interactions. Note that results from the result file can be used for 
detailed reports. At this stage, EDA tools are the method of choice. 



6 Refinement 

6.1 Combining Meta Models And Adaptation of the Region of Interest 

During the sequential step, SPOT can use different meta models. The following 
example demonstrates how results from tree-based regression and response 
surface modeling can be combined. 



6.1.1 Designs 

A central composite design (CCD) was chosen as the starting point of the tuning 
process. SPOT's spotCreateDesignFrF2 plugin can be used to generate design 
points. After the first run is finished, we can use SPOT's report facility to analyze 
results. Since we have chosen a classical factorial design, we will use response 
surface methodology (RSM). Lenth (2009) describes an implementation of RSM 
in R. The R package rsm has many useful tools for an analysis of the results from 
the SPOT runs. After evaluating the algorithm at these design points, a 
second-order regression model with interactions is fitted to the data. Functions 
from the rsm package are used by the SPOT plugin spotPredictLmOptim. Before 
meta models are built, data are standardized. Data in the original units are 
mapped to coded data, i.e., data with values in the interval [-1, 1]. 
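
For a region of interest [low; high], the coding is a shift to the center followed by a division by the half-range. With the interval [1; 50] used in this article, this gives x1 = (TEMP - 25.5)/24.5. A minimal sketch, with helper names code and decode that are not part of SPOT or rsm:

```r
# Coding to [-1, 1] for a region of interest [low, high] (illustrative helpers).
code   <- function(x, low, high) (x - (low + high) / 2) / ((high - low) / 2)
decode <- function(z, low, high) (low + high) / 2 + z * (high - low) / 2

code(c(1, 25.5, 50), low = 1, high = 50)   # borders map to -1 and 1, center to 0
decode(0.5, low = 1, high = 50)            # coded 0.5 corresponds to 37.75
```

Working in coded units puts all variables on the same scale, so that regression coefficients of different parameters become comparable.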



6.1.2 Response Surface Models 

Based on the number of design points, SPOT automatically determines whether 
a first-order, two-way interaction, pure quadratic, or second-order model can be 
fitted to the data. The CCD generated by spotCreateDesignFrF2 allows the 
fit of a second-order model, which can be summarized as follows. 

Coefficients: 
            Estimate Std. Error t value Pr(>|t|) 
(Intercept)  0.93070    0.46212   2.014   0.1375 
x1           2.29074    0.36382   6.296   0.0081 ** 
x2          -1.98286    0.36307  -5.461   0.0121 * 
x1:x2       -2.29498    0.40674  -5.642   0.0110 * 
x1^2         1.86950    0.87244   2.143   0.1215 
x2^2        -0.05832    0.87337  -0.067   0.9510 

Residual standard error: 0.8135 on 3 degrees of freedom 
Multiple R-squared: 0.9734, Adjusted R-squared: 0.9291 
F-statistic: 21.97 on 5 and 3 DF, p-value: 0.01437 

Figure 9: Tuning SANN with SPOT. An rsm and tree based approach are 
combined. Similar to the random forest based meta modeling, TEMP has the 
largest effect. (Regression tree with splits at TEMP = 13.670513660355, 
TMAX = 19.5, and TEMP = 31.4690639535547; leaf values 0.4528 (62 obs), 
1.8467 (12 obs), 2.6678 (12 obs), and 5.1795 (8 obs).) 

Figure 10: Results from the SANN tuning procedure. Function values Y plotted 
versus parameter values. randomForest was used to predict values for one 
variable, say temp, while the other variable (tmax) was set to its optimal value. 
Values for both variables were normalized. This plot was generated with the 
spotReportSens plugin. 

Figure 11: This figure shows an EDA example taken from Bartz-Beielstein et al. 
(2010). Contour plots based on 82 function evaluations of the CMA evolution 
strategy (CMA-ES) optimizing the Ackley function are shown (Hansen 2006). 
Smaller values are better. Better configurations are placed in the lower area of 
the panels. The CMA-ES has four algorithm parameters (CS, NU, DAMPS, 
and NPARENTS). The parameter CS is held constant. NU is plotted versus 
DAMPS, while values of the parameter NPARENTS (np) are varied with the 
slider on top of each panel. 

Figure 12: Response surface model based on the initial design. rsm was used to 
generate this plot. (Axis label: x1 = (TEMP - 25.5)/24.5.) 

6.1.3 Using Gradient Information 

The response surface analysis determines the following stationary point on the 
response surface: (-0.8447783, -0.3781626), or, in the original units, temp = 
4.802932 and tmax = 16.235016. The eigenanalysis shows that the eigenvalues 
($\lambda_1 = 2.4042085$; $\lambda_2 = -0.5930265$) have different signs, so this 
is a saddle point, as can also be seen in Fig. 12. SPOT automatically determines 
the path of the steepest descent and selects five points, using the old center 
point as a starting point, in its direction: (23.7605, 27.215), (22.315, 29.224), 
(21.4085, 31.6005), (21.139, 34.271), and (21.4085, 37.0395), i.e., decreasing 
temp and increasing tmax values are chosen. Rather than at the origin, SPOT can 
start the search at the saddle point. Set seq.useCanonicalPath = TRUE to enable 
this feature. In this case, SPOT determines the most steeply rising ridge in both 
directions, see also Lenth (2009) for details. 

In addition to the points from the steepest descent, the best point from the first 
design (1,50) is evaluated again. Now, these points are evaluated and a new rsm 
model is built. 
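
The stationary point and the eigenvalues can be recomputed from the regression coefficients reported in Sect. 6.1.2 with a few lines of base R. The sketch below writes the fitted second-order model in coded units as y = b0 + b'x + x'Bx, so that the gradient b + 2Bx vanishes at x = -B^{-1} b / 2. It does not use the rsm package, and the variable names are chosen for this illustration only.

```r
# Second-order model in coded units: y = b0 + b'x + x'Bx
b <- c(2.29074, -1.98286)                       # linear coefficients of x1, x2
# diagonal: pure quadratic coefficients; off-diagonal: half the x1:x2 coefficient
B <- matrix(c( 1.86950,     -2.29498 / 2,
              -2.29498 / 2, -0.05832), nrow = 2, byrow = TRUE)

xs <- -0.5 * solve(B, b)                        # stationary point (gradient = 0)
lambda <- eigen(B, symmetric = TRUE)$values     # curvature along principal axes

# Back to original units, using x = (value - 25.5) / 24.5 for both variables
orig <- 25.5 + 24.5 * xs

print(xs)
print(lambda)
print(orig)
```

Because the coefficients are rounded to five decimals, the results agree with the values reported in the text only up to rounding. One positive and one negative eigenvalue confirm the saddle point.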



6.1.4 Automatic Adaptation of the Region of Interest 

SPOT modifies the region of interest if seq.useAdaptiveRoi = TRUE. This 
procedure consists of two phases, which are repeated in an alternating manner. 

During the orientation phase, the direction of the largest improvement is 
determined as described in Sect. 6.1.3. Based on an existing design and related 
function values, the path of the steepest descent is determined. A small number 
of points is chosen from this path. Optimization runs are performed on these 
design points. In some situations, where no gradient information is available, 
the best point from a large number of design points, which were evaluated on 
the regression model, is chosen as the set of improvement points. 

The recalibration phase determines the best point $x_b$. It can be selected 
from the complete set of evaluated design points or from the points along the 
steepest descent only. The best point defines the new center point of a central 
composite design. The minimal distance of $x_b$ to the borders of the actual 
region of interest defines the radius of this design. If $x_b$ is located at (or 
very close to) the borders of the region of interest, a Latin hypercube design 
which covers the whole region of interest is determined. This can be interpreted 
as a restart. To prevent premature convergence of this procedure, one additional 
new design point is generated by a tree-based model. 
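
The recalibration rule can be written down in a few lines. The function below is a sketch of the description above, not SPOT's implementation; the name recalibrate, its arguments, and the tolerance tol are assumptions, and distances are measured in the original units of the region of interest.

```r
# Sketch of the recalibration rule (function and argument names are illustrative).
recalibrate <- function(best, low, high, tol = 1e-8) {
  # smallest distance from the best point to any border of the current ROI
  radius <- min(best - low, high - best)
  if (radius <= tol) {
    # best point sits on the border: restart with a design over the whole ROI
    list(restart = TRUE, low = low, high = high)
  } else {
    # otherwise center a new design region of this radius on the best point
    list(restart = FALSE, low = best - radius, high = best + radius)
  }
}

# Interior point: the new region is centered on (10, 30) with radius 9
print(recalibrate(best = c(10, 30), low = c(1, 1), high = c(50, 50)))

# Point on the border, such as (1, 50): the whole ROI is used again
print(recalibrate(best = c(1, 50), low = c(1, 1), high = c(50, 50)))
```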

Next, the orientation phase is repeated. The tuning process with adaptive 
ROI is visualized in Fig. 13. The final output from this tuning process, which 
is based on regression models and tree-based regression, reads: 

Best solution found with 94 evaluations: 

Y TEMP TMAX COUNT CONFIG 
0.4006016 116 2 



As in Sect. 2.6, ten repeats of the best solution from this tuning process are 
generated. Results are shown in Fig. 14. Note that this result was found with only 
half of the number of SANN runs compared to the random forest modeling 
approach from Sect. 2.6. The example demonstrates how the use of gradient 
information can accelerate the tuning procedure. 

Figure 13: Tuning SANN with SPOT. An rsm and tree based approach are 
combined. (Plot annotation: Eval: 94, Y: 0.400601606392209.) 

Figure 14: Comparison of SANN's random forest tuned parameter values with 
parameter settings obtained with rsm (spotLm) 



6.2 Numerical and Categorical Values 



SPOT provides mechanisms for handling type information. Categorical values 
such as "red", "green", and "blue" have to be coded as integer values, e.g., "1", 
"2", and "3" in the ROI file. By default, they are treated as numerical values 
(float). They can be treated as factors if the corresponding type information 
(factor) is provided in the type column of the ROI file. Alternatively, the type 
INT can be specified in the ROI file. These parameters are treated as numerical 
values, but the spotCreateDesign plugins generate integer values which are 
written to the design files. Bartz-Beielstein et al. (2010) present an example 
which illustrates the usage of type information in SPOT. 
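
Why the type information matters for the meta model can be seen in a small base-R sketch. The data below are invented for illustration, and the column name COLOR_F is hypothetical; the point is only that the same integer-coded column leads to different models depending on whether it is treated as a number or as a factor.

```r
# Sketch: the same integer-coded column treated as a number or as a factor.
des <- data.frame(COLOR = c(1, 2, 3, 1, 2, 3),
                  Y     = c(0.9, 0.4, 0.8, 1.1, 0.5, 0.7))

# Numerical treatment: a single slope, which imposes the ordering 1 < 2 < 3
fitNum <- lm(Y ~ COLOR, data = des)

# Factor treatment: one effect per level, no ordering assumed
des$COLOR_F <- factor(des$COLOR, levels = 1:3,
                      labels = c("red", "green", "blue"))
fitFac <- lm(Y ~ COLOR_F, data = des)

length(coef(fitNum))   # intercept and slope
length(coef(fitFac))   # intercept and two level effects
```

With the numerical treatment, the model cannot represent that the middle level is the best one, whereas the factor treatment estimates each level separately.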



6.3 Meta Projects 

SPOT allows the definition of meta projects. Meta projects perform tuning over 
a set of problem instances. One interesting task is to analyze interactions 
between the search-space dimension, say d, and the best algorithm design p*. For 
example, the experimenter can search for dependencies between population size 
in ES and d. For a detailed documentation, the reader is referred to the package 
help manuals. 



6.4 SPOT as an Optimization Algorithm 

SPOT itself can be used as an optimization algorithm. The package includes 
some demos to illustrate this feature. For example, spotDemoLmOSBranin uses 
a linear (meta) model to optimize Branin's function. 

7 Summary and Outlook 

This article presents basic features of the SPOT package, which is implemented 
in R. SPOT provides tools for automatic and interactive tuning of algorithms. 
Categorical and numerical parameters can be used as input variables, which are 
specified in the ROI file. A configuration file (CONF) collects data related to 
the SPOT run (which is considered as a project) such as the prediction model. 
The reader is referred to the SpotGetOptions help page, which lists SPOT's 
configuration parameters. 

Parameters related to the algorithm or the optimization problem are stored 
in the APD file. SPOT generates simple text files which are used as interfaces 
to the algorithm. 

The sequential approach comprises the following steps: 

• init: generate an initial design 

• run: evaluate the algorithm 

• seq: generate new design points (meta model) 

• rep: statistical analysis and visualization, EDA 

Plugins for these steps are the subject of ongoing research. SPOT can also be run 
in a meta mode to perform tuning over a set of problem instances. Plugin 
development concentrates on combining predictions from several regression models, 
integrating tools for multi-objective optimization, and performing meta SPOT 
runs. 

The SPOT package contains several demos, which can be used as starting 
points for setting up your own SPOT project. Use 

> demo(spotDemo07RandomForestSann, ask=FALSE) 

to start a demo which is related to the experimental setup from Sect. 2.6. 

8 Acknowledgements 

This work was supported by the Bundesministerium für Bildung und Forschung 
(BMBF) under the grants FIWA (AIF FKZ 17N2309, "Ingenieurnachwuchs") 
and SOMA (AIF FKZ 17N1009, "Ingenieurnachwuchs") and by the Cologne 
University of Applied Sciences under the research focus grant COSA. 
Many thanks go to members of the FIWA and SOMA research group. 

9 Appendix 

9.1 R Source Code for Starting SANN 

Listing 8: spotAlgStartSann.R 

spotAlgStartSann <- function(io.apdFileName, io.desFileName, 
                             io.resFileName) { 
  writeLines(paste("Loading design file data from::", 
                   io.desFileName), con = stderr()) 
  # the APD file defines the problem design (x0, maxit, parscale, f, n) 
  source(io.apdFileName, local = TRUE) 
  des <- read.table(io.desFileName, sep = " ", header = TRUE) 
  pNames <- names(des) 
  config <- nrow(des) 
  for (k in 1:config) {                 # loop over configurations 
    for (i in 1:des$REPEATS[k]) {       # loop over repeats 
      if (is.element("TEMP", pNames)) { 
        temp <- des$TEMP[k] 
      } 
      if (is.element("TMAX", pNames)) { 
        tmax <- round(des$TMAX[k]) 
      } 
      conf <- k 
      if (is.element("CONFIG", pNames)) { 
        conf <- des$CONFIG[k] 
      } 
      spotStep <- NA 
      if (is.element("STEP", pNames)) { 
        spotStep <- des$STEP[k] 
      } 
      # each repeat of a configuration uses its own seed 
      seed <- des$SEED[k] + i - 1 
      set.seed(seed) 
      y <- optim(x0, spotFuncStartBraninSann, method = "SANN", 
                 control = list(maxit = maxit, temp = temp, tmax = tmax, 
                                parscale = parscale)) 
      res <- NULL 
      res <- list(Y = y$value, TEMP = temp, TMAX = tmax, FUNCTION = f, 
                  DIM = n, SEED = seed, CONFIG = conf) 
      if (is.element("STEP", pNames)) { 
        res = c(res, STEP = spotStep) 
      } 
      res <- data.frame(res) 
      # write the header only if the result file does not exist yet 
      colNames = TRUE 
      if (file.exists(io.resFileName)) { 
        colNames = FALSE 
      } 
      write.table(res, file = io.resFileName, row.names = FALSE, 
                  col.names = colNames, sep = " ", append = !colNames, 
                  quote = FALSE) 
      colNames = FALSE 
    } 
  } 
} 



9.2 R Source Code for Combining Meta Models 

Listing 9: spotPredictRandomForestMlegp.R 

# The function header and the closing brace are not part of the listing; 
# they are assumed here by analogy with Listing 7. 
spotPredictRandomForestMlegp <- function(rawB, mergedB, largeDesign, 
                                         spotConfig) { 
  spotInstAndLoadPackages("randomForest") 
  spotInstAndLoadPackages("mlegp") 
  xNames <- setdiff(names(rawB), "y") 
  x <- rawB[, xNames] 
  y <- rawB$y 
  # random forest: rank the candidates by predicted value 
  rf.fit <- randomForest(x, y) 
  rf.res <- predict(rf.fit, largeDesign) 
  rf.largeDesign <- largeDesign[order(rf.res, decreasing = FALSE), ] 
  # half of the new design points come from each meta model 
  rf.s <- round(spotConfig$seq.design.new.size / 2) 
  mlegp.s <- spotConfig$seq.design.new.size - rf.s 
  rf.largeDesign <- rf.largeDesign[1:rf.s, ] 
  if (mlegp.s > 0) { 
    # Gaussian process: rank the candidates by the mlegp prediction 
    mlegp.fit <- mlegp(x, y) 
    mlegp.res <- predict(mlegp.fit, largeDesign) 
    mlegp.largeDesign <- largeDesign[order(mlegp.res, decreasing = FALSE), ] 
    mlegp.largeDesign <- mlegp.largeDesign[1:mlegp.s, ] 
    newDesign <- rbind(rf.largeDesign, mlegp.largeDesign) 
  } else { 
    newDesign <- rf.largeDesign 
  } 
  writeLines("spotPredictRandomForestMlegp finished") 
  return(newDesign) 
} 



9.3 R Source Code for the Comparison of SANN Parameter Settings 

First, we will set the seed to obtain reproducible results. 

> set.seed(1) 

Next, we will define the objective function. 

> spotFunctionBranin <- function(x) { 
+     x1 <- x[1] 
+     x2 <- x[2] 
+     (x2 - 5.1/(4 * pi^2) * (x1^2) + 5/pi * x1 - 6)^2 + 10 * (1 - 
+         1/(8 * pi)) * cos(x1) + 10 
+ } 

Then, the starting point for the optimization x0 and the number of function 
evaluations maxit are defined: 

> x0 <- c(10, 10) 
> maxit <- 250 

The parameters specified so far belong to the problem design. Now we have to 
consider parameters from the algorithm design, i.e., parameters that control the 
behavior of the SANN algorithm, namely tmax and temp: 

> tmax <- 10 
> temp <- 10 

Finally, we can start the optimization algorithm (SANN): 

> y1 <- NULL 
> for (i in 1:10) { 
+     set.seed(i) 
+     y1 <- c(y1, optim(x0, spotFunctionBranin, method = "SANN", 
+         control = list(maxit = maxit, temp = temp, 
+             tmax = tmax))$value) 
+ } 
> summary(y1) 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3995  0.4037  0.4174  0.9716  0.6577  4.0670 

> temp <- 1.283295 
> tmax <- 41 
> y2 <- NULL 
> for (i in 1:10) { 
+     set.seed(i) 
+     y2 <- c(y2, optim(x0, spotFunctionBranin, method = "SANN", 
+         control = list(maxit = maxit, temp = temp, 
+             tmax = tmax))$value) 
+ } 
> summary(y2) 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3981  0.3999  0.4007  0.4018  0.4035  0.4085 

> temp <- 0.1 
> tmax <- 1 
> y3 <- NULL 
> for (i in 1:10) { 
+     set.seed(i) 
+     y3 <- c(y3, optim(x0, spotFunctionBranin, method = "SANN", 
+         control = list(maxit = maxit, temp = temp, 
+             tmax = tmax))$value) 
+ } 
> summary(y3) 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3980  0.3982  0.3984  0.3995  0.3989  0.4047 